FIGURE 1 


TGGCCTCCCCAGCTTGCCAGGCACAAGGCTGAGCGGGAGGAAGCGAGAGGCATCTA 

AGCAGGCAGTGTTTTGCCTTCACCCCAAGTGACCATGAGAGGTGCCACGCGAGTCTC 

AATCATGCTCCTCCTAGTAACTGTGTCTGACTGTGCTGTGATCACAGGGGCCTGTGA 

GCGGGATGTCCAGTGTGGGGCAGGCACCTGCTGTGCCATCAGCCTGTGGCTTCGAGG 

GCTGCGGATGTGCACCCCGCTGGGGCGGGAAGGCGAGGAGTGCCACCCCGGCAGCC 

ACAAGGTCCCCTTCTTCAGGAAACGCAAGCACCACACCTGTCCTTGCTTGCCCAACC 

TGCTGTGCTCCAGGTTCCCGGACGGCAGGTACCGCTGCTCCATGGACTTGAAGAACA 

TCAATTTTTAGGCGCTTGCCTGGTCTCAGGATACCCACCATCCTTTTCCTGAGCACAG 

CCTGGATTTTTATTTCTGCCATGAAACCCAGCTCCCATGACTCTCCCAGTCCCTACAC 

TGACTACCCTGATCTCTCTTGTCTAGTACGCACATATGCACACAGGCAGACATACCT 

CCCATCATGACATGGTCCCCAGGCTGGCCTGAGGATGTCACAGCTTGAGGCTGTGGT 

GTGAAAGGTGGCCAGCCTGGTTCTCTTCCCTGCTCAGGCTGCCAGAGAGGTGGTAAA 

TGGCAGAAAGGACATTCCCCCTCCCCTCCCCAGGTGACCTGCTCTCTTTCCTGGGCCC 

TGCCCCTCTCCCCACATGTATCCCTCGGTCTGAATTAGACATTCCTGGGCACAGGCTC 

TTGGGTGCATTGCTCAGAGTCCCAGGTCCTGGCCTGACCCTCAGGCCCTTCACGTGA 

GGTCTGTGAGGACCAATTTGTGGGTAGTTCATCTTCCCTCGATTGGTTAACTCCTTAG 

TTTCAGACCACAGACTCAAGATTGGCTCTTCCCAGAGGGCAGCAGACAGTCACCCCA 

AGGCAGGTGTAGGGAGCCCAGGGAGGCCAATCAGCCCCCTGAAGACTCTGGTCCCA 

GTCAGCCTGTGGCTTGTGGCCTGTGACCTGTGACCTTCTGCCAGAATTGTCATGCCTC 

TGAGGCCCCCTCTTACCACACTTTACCAGTTAACCACTGAAGCCCCCAATTCCCACA 

GCTTTTCCATTAAAATGCAAATGGTGGTGGTTCAATCTAATCTGATATTGACATATTA 

GAAGGCAATTAGGGTGTTTCCTTAAACAACTCCTTTCCAAGGATCAGCCCTGAGAGC 

AGGTTGGTGACTTTGAGGAGGGCAGTCCTCTGTCCAGATTGGGGTGGGAGCAAGGG 

ACAGGGAGCAGGGCAGGGGCTGAAAGGGGCACTGATTCAGACCAGGGAGGCAACT 

ACACACCAACATGCTGGCTTTAGAATAAAAGCACCAACTGAAAAAA 


FIGURE 2 

MRGATRVSIMLLLVTVSDCAVITGACERDVQCGAGTCCAISLWLRGLRMCTPLGREGEEC

HPGSHKVPFFRKRKHHTCPCLPNLLCSRFPDGRYRCSMDLKNINF


Important features: 

Signal peptide: 
1-19 


N-myristoylation sites: 
33 
35 
46 



FIGURE 3A 

PRO XXXXXXXXXXXXXXX (Length = 15 amino acids)
Comparison Protein XXXXXYYYYYYY (Length = 12 amino acids) 

% amino acid sequence identity = 

(the number of identically matching amino acid residues between the two polypeptide sequences 
as determined by ALIGN-2) divided by (the total number of amino acid residues of the PRO 
polypeptide) = 



5 divided by 15 = 33.3% 
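
The calculation in this figure can be sketched in C as follows. This is a minimal illustration and not the ALIGN-2 program itself: the function name `percent_identity` and its parameters are hypothetical, and the number of identical matches is supplied by the caller rather than computed by an alignment.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of ALIGN-2): percent sequence identity
 * as defined in the figure, i.e. the number of identical matches divided
 * by the total number of residues of the PRO polypeptide, times 100.
 */
double percent_identity(int matches, int pro_length)
{
    assert(pro_length > 0);                        /* avoid division by zero */
    assert(matches >= 0 && matches <= pro_length); /* cannot exceed PRO length */
    return 100.0 * (double)matches / (double)pro_length;
}
```

With the values shown above, `percent_identity(5, 15)` evaluates to roughly 33.3. The length of the comparison sequence does not enter the calculation; only the PRO sequence length is used as the denominator.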


FIGURE 3B 


PRO XXXXXXXXXX (Length = 10 amino acids) 

Comparison Protein XXXXXYYYYYYZZYZ (Length = 15 amino acids) 

% amino acid sequence identity = 

(the number of identically matching amino acid residues between the two polypeptide sequences 
as determined by ALIGN-2) divided by (the total number of amino acid residues of the PRO 
polypeptide) = 

5 divided by 10 = 50% 


FIGURE 3C 


PRO-DNA NNNNNNNNNNNNNN (Length = 14 nucleotides) 

Comparison DNA NNNNNNLLLLLLLLLL (Length = 16 nucleotides) 

% nucleic acid sequence identity = 

(the number of identically matching nucleotides between the two nucleic acid sequences as 
determined by ALIGN-2) divided by (the total number of nucleotides of the PRO-DNA nucleic 
acid sequence) = 



6 divided by 14 = 42.9% 


FIGURE 3D 


PRO-DNA NNNNNNNNNNNN (Length = 12 nucleotides) 

Comparison DNA NNNNLLLVV (Length = 9 nucleotides) 

% nucleic acid sequence identity = 

(the number of identically matching nucleotides between the two nucleic acid sequences as 
determined by ALIGN-2) divided by (the total number of nucleotides of the PRO-DNA nucleic 
acid sequence) = 

4 divided by 12 = 33.3% 


FIGURE 4 


TGGCTCCCCAGCTTGCCAGGCACAAGGCTGAGCTGGAGGAAGCGAGANGCATCTAAGCAG

GCAGTGTTTTGCCTTCACCCCAAGTGACCATGAGAGGTGCCACGCGAGTCTCAATCATGC

TCCTCCTAGTAACTGTGTCTGACTGTGCTGTGATCACAGGGGCCTGTGAGCGGGATGTCC

AGTGTGGGGCAGGCACCTGCTGTGCCATCAGCCTGTGGCTTCGAGGGCTGCGGATGTGCA

CCCCGCTGGGGCGGGAAGGCGAGGAGTGCCACCCCGGCAGCCACAAGGTCCCCTTCTTCA

GGAAACGCAAGCACCACACCTGTCTTGTTGCCCAACCTGCTGTGCTCCAGTTCCGGACGG

CAGTACGCTGCTCA



FIGURE 5B 

FIGURE 6C 




FIGURE 7 A 



FIGURE 7B 



FIGURE 8 



FIGURE 10 A 



FIGURE 10 B 


Bound (picomolar) 


FIGURE 10 C 



FIGURE 11 A 



Total (picomolar) 


Kd= 0.95 +/- 0.6 nM 

FIGURE 14 A 


FIGURE 14 B 


FIGURE 15 


FIGURE 16 A-C 


TCGCCTCCCCAQCTTGCCAQGCACAAGGCTGAGCGGGAGGAAGCGAGAGG 50 
CATOTMGCftGGCAGTGTTTTGCCTTCACCCCMGTGACCATGAGAGGTC 


CX^ACGCGAGTCTCAATCATGCTCCTCCTAGTAACTSTGTCTCACTCTGCT 
ATRVSIMLLLVT VSDC A 
GTGATCACAGGGGCCICTGAGCGGGATGTCCAGTGTGGGGCAGGCACCTC 200 

VITGACERDVQCGAGTC 
CTGTGCCATCAGCCTGTGGCTTCGAGGGCTG<X^ATGrK^ 

CAIS LWLRGLRMCTPL 
GGCGGGAAGGCGAGGAGTGCCACCCCGGCAGCCACAAGGTCCCCTTCTTC 
GREGEECHPGSHKVPPF 
AGGAAACGCAAGCACCACACCTGTCXTTGCTTGCC^ 

RKRKHHTCPCIiPNLLCS 
CAGGTTCCCGGACGGCAGGTACCX5CTGCTCCATGGACTTO 400 

R F PDGRYRCSMDLKNI 
ATTTTTAGGCGCTTGCCTGGTC TCAGGATACCCA CCATCCTTTTCCTGAG 
N P * 

CACAGCCTGGATTITTATTTCTGCCATGAAACCCAGCTCCCA 

CCAGTCCCTACACTGACTACCCTCATCTCTCTTGTCTAGTACGCACATAT 

GC AC AC AGGCAGAC ATACCTCC CATCATGACATGGTCCCC AGGCTCGC CT 600 

GAGGATGTCACAGCTTGAGGCTGTGGTGIXJAAAGGTG^^ 

TC1TCCCTGCTCAGGCTGCCAGAGAGGTGGTAAATGGCAGAAAGGACATT 

CCCCCTCCCCTCCCCAGGTGACCTGCTCTCTT^ 

TC CC CACATGTA TC CC TC GGTCTGAATTAG AC ATTC CTGGGC ACAGGCTC 800 

TTGGGTGCATTGC1X^GAGTCCCAGGTCCTGG<XTGACCCTCAGGCCCTT 

CA<XITGAGGTCTGTGAGGACCAATTTGTGGGTAGTTCATC 

TGGTTAACTCCTTAGTTTCAGACCACAGACTCAAGATira 

AGGGCAGCAGACAGTCACCCCAAGGCAGGTGTAGGGAGCCCAGGGAGGCC 1000 

AATCAGCCCCCTGAAGACTCTX^TCX^CAGTCAGCCTGTGGCTTC 

GTGACCTGTGACCTTCTGCXAGAATTGTCATGCCTCTGAGGCCCCCTC^ 

ACCACACTTTACCAGTTAACCACTGAAGCCCCCAATTCrcAC^ 

CATTAAAATGCAAATGGTGGTGGTTCAATCTAATCTGATATTGACATATT 1200 

AGAAGG CAA TTAGGGTGTTTCCTTAAAC AACTCCTTTC CAAGGATC AGCC 

CTGAGAGCAGGTTGGTGACTTTGAGGAGGGCAGTCCTCTC 

GGTGGGAGCAAGGGACAGGGAGCAGGGCAGGGGCTGAAAGGGGCACTGAT 

TCAGACCAGGGAGGCAACTACACACCAACATGCTGGCTTTAGAATAAAAG 1400 

CACCAACTGAAAAAA 


MRGATRVS IMLLLVTVSDC WITGACTOE V 2CG ft 3TO 
MLLLLLLLPPLLLPRAGM WITGAdPlfc S 2CG Z ^2 


WITCACERIIUXXMSTCCAVS 

FIGURE 17 A 


FIGURE 17 B 


normalized luciferase activity 

FIGURE 20 A 


/*
 *
 * C-C increased from 12 to 15
 * Z is average of EQ
 * B is average of ND
 * match with stop is _M; stop-stop = 0; J (joker) match = 0
 */

#define _M -8 /* value of a match with a stop */

int _day[26][26] = {

/*        A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z */

/* A */ { 2, 0,-2, 0, 0,-4, 1,-1,-1, 0,-1,-2,-1, 0,_M, 1, 0,-2, 1, 1, 0, 0,-6, 0,-3, 0},

/* B */ { 0, 3,-4, 3, 2,-5, 0, 1,-2, 0, 0,-3,-2, 2,_M,-1, 1, 0, 0, 0, 0,-2,-5, 0,-3, 1},

/* C */ {-2,-4,15,-5,-5,-4,-3,-3,-2, 0,-5,-6,-5,-4,_M,-3,-5,-4, 0,-2, 0,-2,-8, 0, 0,-5},

/* D */ { 0, 3,-5, 4, 3,-6, 1, 1,-2, 0, 0,-4,-3, 2,_M,-1, 2,-1, 0, 0, 0,-2,-7, 0,-4, 2},

/* E */ { 0, 2,-5, 3, 4,-5, 0, 1,-2, 0, 0,-3,-2, 1,_M,-1, 2,-1, 0, 0, 0,-2,-7, 0,-4, 3},

/* F */ {-4,-5,-4,-6,-5, 9,-5,-2, 1, 0,-5, 2, 0,-4,_M,-5,-5,-4,-3,-3, 0,-1, 0, 0, 7,-5},

/* G */ { 1, 0,-3, 1, 0,-5, 5,-2,-3, 0,-2,-4,-3, 0,_M,-1,-1,-3, 1, 0, 0,-1,-7, 0,-5, 0},

/* H */ {-1, 1,-3, 1, 1,-2,-2, 6,-2, 0, 0,-2,-2, 2,_M, 0, 3, 2,-1,-1, 0,-2,-3, 0, 0, 2},

/* I */ {-1,-2,-2,-2,-2, 1,-3,-2, 5, 0,-2, 2, 2,-2,_M,-2,-2,-2,-1, 0, 0, 4,-5, 0,-1,-2},

/* J */ { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,_M, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},

/* K */ {-1, 0,-5, 0, 0,-5,-2, 0,-2, 0, 5,-3, 0, 1,_M,-1, 1, 3, 0, 0, 0,-2,-3, 0,-4, 0},

/* L */ {-2,-3,-6,-4,-3, 2,-4,-2, 2, 0,-3, 6, 4,-3,_M,-3,-2,-3,-3,-1, 0, 2,-2, 0,-1,-2},

/* M */ {-1,-2,-5,-3,-2, 0,-3,-2, 2, 0, 0, 4, 6,-2,_M,-2,-1, 0,-2,-1, 0, 2,-4, 0,-2,-1},

/* N */ { 0, 2,-4, 2, 1,-4, 0, 2,-2, 0, 1,-3,-2, 2,_M,-1, 1, 0, 1, 0, 0,-2,-4, 0,-2, 1},

/* O */ {_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M, 0,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M},

/* P */ { 1,-1,-3,-1,-1,-5,-1, 0,-2, 0,-1,-3,-2,-1,_M, 6, 0, 0, 1, 0, 0,-1,-6, 0,-5, 0},

/* Q */ { 0, 1,-5, 2, 2,-5,-1, 3,-2, 0, 1,-2,-1, 1,_M, 0, 4, 1,-1,-1, 0,-2,-5, 0,-4, 3},

/* R */ {-2, 0,-4,-1,-1,-4,-3, 2,-2, 0, 3,-3, 0, 0,_M, 0, 1, 6, 0,-1, 0,-2, 2, 0,-4, 0},

/* S */ { 1, 0, 0, 0, 0,-3, 1,-1,-1, 0, 0,-3,-2, 1,_M, 1,-1, 0, 2, 1, 0,-1,-2, 0,-3, 0},

/* T */ { 1, 0,-2, 0, 0,-3, 0,-1, 0, 0, 0,-1,-1, 0,_M, 0,-1,-1, 1, 3, 0, 0,-5, 0,-3, 0},

/* U */ { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,_M, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},

/* V */ { 0,-2,-2,-2,-2,-1,-1,-2, 4, 0,-2, 2, 2,-2,_M,-1,-2,-2,-1, 0, 0, 4,-6, 0,-2,-2},

/* W */ {-6,-5,-8,-7,-7, 0,-7,-3,-5, 0,-3,-2,-4,-4,_M,-6,-5, 2,-2,-5, 0,-6,17, 0, 0,-6},

/* X */ { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,_M, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},

/* Y */ {-3,-3, 0,-4,-4, 7,-5, 0,-1, 0,-4,-1,-2,-2,_M,-5,-4,-4,-3,-3, 0,-2, 0, 0,10,-4},

/* Z */ { 0, 1,-5, 2, 3,-5, 0, 2,-2, 0, 0,-2,-1, 1,_M, 0, 3, 0, 0, 0, 0,-2,-6, 0,-4, 4}

};

FIGURE 20 B 


/*
 */

#include <stdio.h>
#include <ctype.h>

#define MAXJMP 16   /* max jumps in a diag */
#define MAXGAP 24   /* don't continue to penalize gaps larger than this */
#define JMPS   1024 /* max jmps in an path */
#define MX     4    /* save if there's at least MX-1 bases since last jmp */

#define DMAT   3    /* value of matching bases */
#define DMIS   0    /* penalty for mismatched bases */
#define DINS0  8    /* penalty for a gap */
#define DINS1  1    /* penalty per base */
#define PINS0  8    /* penalty for a gap */
#define PINS1  4    /* penalty per residue */


struct jmp {
    short          n[MAXJMP]; /* size of jmp (neg for dely) */
    unsigned short x[MAXJMP]; /* base no. of jmp in seq x */
};                            /* limits seq to 2^16 -1 */

struct diag {
    int        score;  /* score at last jmp */
    long       offset; /* offset of prev block */
    short      ijmp;   /* current jmp index */
    struct jmp jp;     /* list of jmps */
};


struct path {
    int   spc;     /* number of leading spaces */
    short n[JMPS]; /* size of jmp (gap) */
    int   x[JMPS]; /* loc of jmp (last elem before gap) */
};

char        *ofile;       /* output file name */
char        *namex[2];    /* seq names: getseqs() */
char        *prog;        /* prog name for err msgs */
char        *seqx[2];     /* seqs: getseqs() */
int         dmax;         /* best diag: nw() */
int         dmax0;        /* final diag */
int         dna;          /* set if dna: main() */
int         endgaps;      /* set if penalizing end gaps */
int         gapx, gapy;   /* total gaps in seqs */
int         len0, len1;   /* seq lens */
int         ngapx, ngapy; /* total size of gaps */
int         smax;         /* max score: nw() */
int         *xbm;         /* bitmap for matching */
long        offset;       /* current offset in jmp file */
struct diag *dx;          /* holds diagonals */
struct path pp[2];        /* holds path for seqs */

char *calloc(), *malloc(), *index(), *strcpy();
char *getseq(), *g_calloc();

FIGURE 20 C 

/* Needleman-Wunsch alignment program
 *
 * usage: progs file1 file2
 *  where file1 and file2 are two dna or two protein sequences.
 *  The sequences can be in upper- or lower-case an may contain ambiguity
 *  Any lines beginning with ';', '>' or '<' are ignored
 *  Max file length is 65535 (limited by unsigned short x in the jmp struct)
 *  A sequence with 1/3 or more of its elements ACGTU is assumed to be DNA
 *  Output is in the file "align.out"
 *
 * The program may create a tmp file in /tmp to hold info about traceback.
 * Original version developed under BSD 4.3 on a vax 8650
 */

#include "nw.h"
#include "day.h"

static _dbval[26] = {
    1,14,2,13,0,0,4,11,0,0,12,0,3,15,0,0,0,5,6,8,8,7,9,0,10,0
};

static _pbval[26] = {
    1, 2|(1<<('D'-'A'))|(1<<('N'-'A')), 4, 8, 16, 32, 64,
    128, 256, 0xFFFFFFF, 1<<10, 1<<11, 1<<12, 1<<13, 1<<14,
    1<<15, 1<<16, 1<<17, 1<<18, 1<<19, 1<<20, 1<<21, 1<<22,
    1<<23, 1<<24, 1<<25|(1<<('E'-'A'))|(1<<('Q'-'A'))
};

main(ac, av)
    int  ac;
    char *av[];
{
    prog = av[0];
    if (ac != 3) {
        fprintf(stderr, "usage: %s file1 file2\n", prog);
        fprintf(stderr, "where file1 and file2 are two dna or two protein sequences.\n");
        fprintf(stderr, "The sequences can be in upper- or lower-case\n");
        fprintf(stderr, "Any lines beginning with ';' or '<' are ignored\n");
        fprintf(stderr, "Output is in the file \"align.out\"\n");
        exit(1);
    }
    namex[0] = av[1];
    namex[1] = av[2];
    seqx[0] = getseq(namex[0], &len0);
    seqx[1] = getseq(namex[1], &len1);
    xbm = (dna)? _dbval : _pbval;

    endgaps = 0;         /* 1 to penalize endgaps */
    ofile = "align.out"; /* output file */

    nw();       /* fill in the matrix, get the possible jmps */
    readjmps(); /* get the actual jmps */
    print();    /* print stats, alignment */

    cleanup(0); /* unlink any tmp files */
}
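
The `_dbval` table above assigns each nucleotide letter a bit pattern (A=1, C=2, G=4, T and U=8), with ambiguity letters coded as the OR of the bases they can stand for, so two letters are treated as matching when their bit patterns share a bit. The following is a small standalone sketch of that test; the function name `bases_match` is hypothetical and does not appear in the listing, and the case folding is an addition for convenience.

```c
#include <assert.h>
#include <ctype.h>

/* Bit codes for 'A'..'Z', copied from the listing's _dbval table:
 * A=1, C=2, G=4, T=U=8; an ambiguity letter is the OR of the bases it covers. */
static const int dbval[26] = {
    1,14,2,13,0,0,4,11,0,0,12,0,3,15,0,0,0,5,6,8,8,7,9,0,10,0
};

/* Hypothetical helper: nonzero if the two letters can denote the same base. */
int bases_match(char a, char b)
{
    assert(isalpha((unsigned char)a) && isalpha((unsigned char)b));
    a = (char)toupper((unsigned char)a);
    b = (char)toupper((unsigned char)b);
    return (dbval[a - 'A'] & dbval[b - 'A']) != 0;
}
```

For instance, R (A or G, code 5) matches G (code 4), while Y (C or T, code 10) does not match A (code 1). Letters with code 0 never match anything, including themselves.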

FIGURE 20 D 


/* do the alignment, return best score: main()
 * dna: values in Fitch and Smith, PNAS, 80, 1382-1386, 1983
 * pro: PAM 250 values
 * When scores are equal, we prefer mismatches to any gap, prefer
 * a new gap to extending an ongoing gap, and prefer a gap in seqx
 * to a gap in seq y.
 */

nw()
{
    char     *px, *py;       /* seqs and ptrs */
    int      *ndely, *dely;  /* keep track of dely */
    int      ndelx, delx;    /* keep track of delx */
    int      *tmp;           /* for swapping row0, row1 */
    int      mis;            /* score for each type */
    int      ins0, ins1;     /* insertion penalties */
    register id;             /* diagonal index */
    register ij;             /* jmp index */
    register *col0, *col1;   /* score for curr, last row */
    register xx, yy;         /* index into seqs */

    dx = (struct diag *)g_calloc("to get diags", len0+len1+1, sizeof(struct diag));

    ndely = (int *)g_calloc("to get ndely", len1+1, sizeof(int));
    dely = (int *)g_calloc("to get dely", len1+1, sizeof(int));
    col0 = (int *)g_calloc("to get col0", len1+1, sizeof(int));
    col1 = (int *)g_calloc("to get col1", len1+1, sizeof(int));
    ins0 = (dna)? DINS0 : PINS0;
    ins1 = (dna)? DINS1 : PINS1;

    smax = -10000;
    if (endgaps) {
        for (col0[0] = dely[0] = -ins0, yy = 1; yy <= len1; yy++) {
            col0[yy] = dely[yy] = col0[yy-1] - ins1;
            ndely[yy] = yy;
        }
        col0[0] = 0; /* Waterman Bull Math Biol 84 */
    }
    else
        for (yy = 1; yy <= len1; yy++)
            dely[yy] = -ins0;

    /* fill in match matrix
     */
    for (px = seqx[0], xx = 1; xx <= len0; px++, xx++) {
        /* initialize first entry in col
         */
        if (endgaps) {
            if (xx == 1)
                col1[0] = delx = -(ins0+ins1);
            else
                col1[0] = delx = col0[0] - ins1;
            ndelx = xx;
        }
        else {
            col1[0] = 0;
            delx = -ins0;
            ndelx = 0;
        }

FIGURE 20 E 


        for (py = seqx[1], yy = 1; yy <= len1; py++, yy++) {
            mis = col0[yy-1];
            if (dna)
                mis += (xbm[*px-'A']&xbm[*py-'A'])? DMAT : DMIS;
            else
                mis += _day[*px-'A'][*py-'A'];

            /* update penalty for del in x seq;
             * favor new del over ongong del
             * ignore MAXGAP if weighting endgaps
             */
            if (endgaps || ndely[yy] < MAXGAP) {
                if (col0[yy] - ins0 >= dely[yy]) {
                    dely[yy] = col0[yy] - (ins0+ins1);
                    ndely[yy] = 1;
                } else {
                    dely[yy] -= ins1;
                    ndely[yy]++;
                }
            } else {
                if (col0[yy] - (ins0+ins1) >= dely[yy]) {
                    dely[yy] = col0[yy] - (ins0+ins1);
                    ndely[yy] = 1;
                } else
                    ndely[yy]++;
            }

            /* update penalty for del in y seq;
             * favor new del over ongong del
             */
            if (endgaps || ndelx < MAXGAP) {
                if (col1[yy-1] - ins0 >= delx) {
                    delx = col1[yy-1] - (ins0+ins1);
                    ndelx = 1;
                } else {
                    delx -= ins1;
                    ndelx++;
                }
            } else {
                if (col1[yy-1] - (ins0+ins1) >= delx) {
                    delx = col1[yy-1] - (ins0+ins1);
                    ndelx = 1;
                } else
                    ndelx++;
            }

            /* pick the maximum score; we're favoring
             * mis over any del and delx over dely
             */

FIGURE 20 F 


            id = xx - yy + len1 - 1;
            if (mis >= delx && mis >= dely[yy])
                col1[yy] = mis;
            else if (delx >= dely[yy]) {
                col1[yy] = delx;
                ij = dx[id].ijmp;
                if (dx[id].jp.n[0] && (!dna || (ndelx >= MAXJMP
                && xx > dx[id].jp.x[ij]+MX) || mis > dx[id].score+DINS0)) {
                    dx[id].ijmp++;
                    if (++ij >= MAXJMP) {
                        writejmps(id);
                        ij = dx[id].ijmp = 0;
                        dx[id].offset = offset;
                        offset += sizeof(struct jmp) + sizeof(offset);
                    }
                }
                dx[id].jp.n[ij] = ndelx;
                dx[id].jp.x[ij] = xx;
                dx[id].score = delx;
            }
            else {
                col1[yy] = dely[yy];
                ij = dx[id].ijmp;
                if (dx[id].jp.n[0] && (!dna || (ndely[yy] >= MAXJMP
                && xx > dx[id].jp.x[ij]+MX) || mis > dx[id].score+DINS0)) {
                    dx[id].ijmp++;
                    if (++ij >= MAXJMP) {
                        writejmps(id);
                        ij = dx[id].ijmp = 0;
                        dx[id].offset = offset;
                        offset += sizeof(struct jmp) + sizeof(offset);
                    }
                }
                dx[id].jp.n[ij] = -ndely[yy];
                dx[id].jp.x[ij] = xx;
                dx[id].score = dely[yy];
            }
            if (xx == len0 && yy < len1) {
                /* last col
                 */
                if (endgaps)
                    col1[yy] -= ins0+ins1*(len1-yy);
                if (col1[yy] > smax) {
                    smax = col1[yy];
                    dmax = id;
                }
            }
        }
        if (endgaps && xx < len0)
            col1[yy-1] -= ins0+ins1*(len0-xx);
        if (col1[yy-1] > smax) {
            smax = col1[yy-1];
            dmax = id;
        }
        tmp = col0; col0 = col1; col1 = tmp;
    }
    (void) free((char *)ndely);
    (void) free((char *)dely);
    (void) free((char *)col0);
    (void) free((char *)col1);
}
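
The scoring scheme used by nw() for DNA (match DMAT, mismatch DMIS, and a gap of k bases costing DINS0 + k*DINS1) can be illustrated with a much simpler standalone function. The sketch below is not the listing's algorithm: it is a textbook three-matrix affine-gap recurrence that always penalizes end gaps, compares letters by exact equality rather than through the `xbm` bitmap, ignores MAXGAP, and returns only the score without recording jumps. The names `affine_score`, `MAXLEN` and `NEG` are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define DMAT   3           /* score for matching bases (value from the listing) */
#define DMIS   0           /* score for mismatched bases */
#define DINS0  8           /* penalty for opening a gap */
#define DINS1  1           /* penalty per gapped base */
#define MAXLEN 64          /* hypothetical size limit for this sketch */
#define NEG    (-1000000)  /* stands in for "minus infinity" */

static int max2(int a, int b) { return a > b ? a : b; }
static int max3(int a, int b, int c) { return max2(max2(a, b), c); }

/* Best global alignment score of x and y; end gaps are penalized. */
int affine_score(const char *x, const char *y)
{
    static int m[MAXLEN + 1][MAXLEN + 1];   /* x[i] aligned with y[j] */
    static int gx[MAXLEN + 1][MAXLEN + 1];  /* x[i] aligned with a gap */
    static int gy[MAXLEN + 1][MAXLEN + 1];  /* y[j] aligned with a gap */
    int n = (int)strlen(x), k = (int)strlen(y), i, j;

    assert(n <= MAXLEN && k <= MAXLEN);

    /* Boundary: a leading gap of length i costs DINS0 + i*DINS1. */
    m[0][0] = 0;
    gx[0][0] = gy[0][0] = NEG;
    for (i = 1; i <= n; i++) {
        gx[i][0] = -(DINS0 + DINS1 * i);
        m[i][0] = gy[i][0] = NEG;
    }
    for (j = 1; j <= k; j++) {
        gy[0][j] = -(DINS0 + DINS1 * j);
        m[0][j] = gx[0][j] = NEG;
    }

    for (i = 1; i <= n; i++) {
        for (j = 1; j <= k; j++) {
            int s = (x[i - 1] == y[j - 1]) ? DMAT : DMIS;

            m[i][j] = max3(m[i-1][j-1], gx[i-1][j-1], gy[i-1][j-1]) + s;
            /* open a new gap (DINS0 + DINS1) or extend one (DINS1) */
            gx[i][j] = max3(m[i-1][j] - (DINS0 + DINS1),
                            gx[i-1][j] - DINS1,
                            gy[i-1][j] - (DINS0 + DINS1));
            gy[i][j] = max3(m[i][j-1] - (DINS0 + DINS1),
                            gy[i][j-1] - DINS1,
                            gx[i][j-1] - (DINS0 + DINS1));
        }
    }
    return max3(m[n][k], gx[n][k], gy[n][k]);
}
```

Aligning "ACGT" with itself scores 4 x DMAT = 12, and aligning "ACGT" with "AGT" scores 0 (three matches worth 9 minus one single-base gap costing 8 + 1). The listing reaches comparable scores with two rolling columns and per-diagonal jump records instead of full matrices, which keeps memory proportional to the sequence lengths.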
/* print() - only routine visible outside this module
 *
 * static:
 * getmat() - trace back best path, count matches: print()
 * pr_align() - print alignment of described in array p[]: print()
 * dumpblock() - dump a block of lines with numbers, stars: pr_align()
 * nums() - put out a number line: dumpblock()
 * putline() - put out a line (name, [num], seq, [num]): dumpblock()
 * stars() - put a line of stars: dumpblock()
 * stripname() - strip any path and prefix from a seqname
 */

#include "nw.h"

#define SPC    3
#define P_LINE 256 /* maximum output line */
#define P_SPC  3   /* space between name or num and seq */

extern _day[26][26];
int    olen; /* set output line length */
FILE   *fx;  /* output file */

print()
{
    int lx, ly, firstgap, lastgap; /* overlap */

    if ((fx = fopen(ofile, "w")) == 0) {
        fprintf(stderr, "%s: can't write %s\n", prog, ofile);
        cleanup(1);
    }
    fprintf(fx, "<first sequence: %s (length = %d)\n", namex[0], len0);
    fprintf(fx, "<second sequence: %s (length = %d)\n", namex[1], len1);
    olen = 60;
    lx = len0;
    ly = len1;
    firstgap = lastgap = 0;
    if (dmax < len1 - 1) {       /* leading gap in x */
        pp[0].spc = firstgap = len1 - dmax - 1;
        ly -= pp[0].spc;
    }
    else if (dmax > len1 - 1) {  /* leading gap in y */
        pp[1].spc = firstgap = dmax - (len1 - 1);
        lx -= pp[1].spc;
    }
    if (dmax0 < len0 - 1) {      /* trailing gap in x */
        lastgap = len0 - dmax0 - 1;
        lx -= lastgap;
    }
    else if (dmax0 > len0 - 1) { /* trailing gap in y */
        lastgap = dmax0 - (len0 - 1);
        ly -= lastgap;
    }
    getmat(lx, ly, firstgap, lastgap);
    pr_align();
}

/*
 * trace back the best path, count matches
 */
static
getmat(lx, ly, firstgap, lastgap)
    int lx, ly;             /* "core" (minus endgaps) */
    int firstgap, lastgap;  /* leading trailing overlap */
{
    int           nm, i0, i1, siz0, siz1;
    char          outx[32];
    double        pct;
    register      n0, n1;
    register char *p0, *p1;

    /* get total matches, score
     */
    i0 = i1 = siz0 = siz1 = 0;
    p0 = seqx[0] + pp[1].spc;
    p1 = seqx[1] + pp[0].spc;
    n0 = pp[1].spc + 1;
    n1 = pp[0].spc + 1;

    nm = 0;
    while (*p0 && *p1) {
        if (siz0) {
            p1++;
            n1++;
            siz0--;
        }
        else if (siz1) {
            p0++;
            n0++;
            siz1--;
        }
        else {
            if (xbm[*p0-'A']&xbm[*p1-'A'])
                nm++;
            if (n0++ == pp[0].x[i0])
                siz0 = pp[0].n[i0++];
            if (n1++ == pp[1].x[i1])
                siz1 = pp[1].n[i1++];
            p0++;
            p1++;
        }
    }

    /* pct homology:
     * if penalizing endgaps, base is the shorter seq
     * else, knock off overhangs and take shorter core
     */
    if (endgaps)
        lx = (len0 < len1)? len0 : len1;
    else
        lx = (lx < ly)? lx : ly;
    pct = 100.*(double)nm/(double)lx;
    fprintf(fx, "\n");
    fprintf(fx, "<%d match%s in an overlap of %d: %.2f percent similarity\n",
        nm, (nm == 1)? "" : "es", lx, pct);

    fprintf(fx, "<gaps in first sequence: %d", gapx);
    if (gapx) {
        (void) sprintf(outx, " (%d %s%s)",
            ngapx, (dna)? "base":"residue", (ngapx == 1)? "":"s");
        fprintf(fx, "%s", outx);
    }

    fprintf(fx, ", gaps in second sequence: %d", gapy);
    if (gapy) {
        (void) sprintf(outx, " (%d %s%s)",
            ngapy, (dna)? "base":"residue", (ngapy == 1)? "":"s");
        fprintf(fx, "%s", outx);
    }
    if (dna)
        fprintf(fx,
        "\n<score: %d (match = %d, mismatch = %d, gap penalty = %d + %d per base)\n",
        smax, DMAT, DMIS, DINS0, DINS1);
    else
        fprintf(fx,
        "\n<score: %d (Dayhoff PAM 250 matrix, gap penalty = %d + %d per residue)\n",
        smax, PINS0, PINS1);
    if (endgaps)
        fprintf(fx,
        "<endgaps penalized. left endgap: %d %s%s, right endgap: %d %s%s\n",
        firstgap, (dna)? "base" : "residue", (firstgap == 1)? "" : "s",
        lastgap, (dna)? "base" : "residue", (lastgap == 1)? "" : "s");
    else
        fprintf(fx, "<endgaps not penalized\n");
}


static      nm;             /* matches in core - for checking */
static      lmax;           /* lengths of stripped file names */
static      ij[2];          /* jmp index for a path */
static      nc[2];          /* number at start of current line */
static      ni[2];          /* current elem number - for gapping */
static      siz[2];
static char *ps[2];         /* ptr to current element */
static char *po[2];         /* ptr to next output char slot */
static char out[2][P_LINE]; /* output line */
static char star[P_LINE];   /* set by stars() */

/*
 * print alignment of described in struct path pp[]
 */
static
pr_align()
{
    int      nn; /* char count */
    int      more;
    register i;

    for (i = 0, lmax = 0; i < 2; i++) {
        nn = stripname(namex[i]);
        if (nn > lmax)
            lmax = nn;

        nc[i] = 1;
        ni[i] = 1;
        siz[i] = ij[i] = 0;
        ps[i] = seqx[i];
        po[i] = out[i];
    }

    for (nn = nm = 0, more = 1; more; ) {
        for (i = more = 0; i < 2; i++) {
            /*
             * do we have more of this sequence?
             */
            if (!*ps[i])
                continue;

            more++;

            if (pp[i].spc) {     /* leading space */
                *po[i]++ = ' ';
                pp[i].spc--;
            }
            else if (siz[i]) {   /* in a gap */
                *po[i]++ = '-';
                siz[i]--;
            }
            else {               /* we're putting a seq element
                                  */
                *po[i] = *ps[i];
                if (islower(*ps[i]))
                    *ps[i] = toupper(*ps[i]);
                po[i]++;
                ps[i]++;

                /*
                 * are we at next gap for this seq?
                 */
                if (ni[i] == pp[i].x[ij[i]]) {
                    /*
                     * we need to merge all gaps
                     * at this location
                     */
                    siz[i] = pp[i].n[ij[i]++];
                    while (ni[i] == pp[i].x[ij[i]])
                        siz[i] += pp[i].n[ij[i]++];
                }
                ni[i]++;
            }
        }
        if (++nn == olen || !more && nn) {
            dumpblock();
            for (i = 0; i < 2; i++)
                po[i] = out[i];
            nn = 0;
        }
    }
}


/*
 * dump a block of lines, including numbers, stars: pr_align()
 */
static
dumpblock()
{
    register i;

    for (i = 0; i < 2; i++)
        *po[i]-- = '\0';

    (void) putc('\n', fx);
    for (i = 0; i < 2; i++) {
        if (*out[i] && (*out[i] != ' ' || *(po[i]) != ' ')) {
            if (i == 0)
                nums(i);
            if (i == 0 && *out[1])
                stars();
            putline(i);
            if (i == 0 && *out[1])
                fprintf(fx, star);
            if (i == 1)
                nums(i);
        }
    }
}

/*
 * put out a number line: dumpblock()
 */
static
nums(ix)
    int ix; /* index in out[] holding seq line */
{
    char          nline[P_LINE];
    register      i, j;
    register char *pn, *px, *py;

    for (pn = nline, i = 0; i < lmax+P_SPC; i++, pn++)
        *pn = ' ';
    for (i = nc[ix], py = out[ix]; *py; py++, pn++) {
        if (*py == ' ' || *py == '-')
            *pn = ' ';
        else {
            if (i%10 == 0 || (i == 1 && nc[ix] != 1)) {
                j = (i < 0)? -i : i;
                for (px = pn; j; j /= 10, px--)
                    *px = j%10 + '0';
                if (i < 0)
                    *px = '-';
            }
            else
                *pn = ' ';
            i++;
        }
    }
    *pn = '\0';
    nc[ix] = i;
    for (pn = nline; *pn; pn++)
        (void) putc(*pn, fx);
    (void) putc('\n', fx);
}

/*
 * put out a line (name, [num], seq, [num]): dumpblock()
 */
static
putline(ix)
    int ix;
{
    int           i;
    register char *px;

    for (px = namex[ix], i = 0; *px && *px != ':'; px++, i++)
        (void) putc(*px, fx);
    for (; i < lmax+P_SPC; i++)
        (void) putc(' ', fx);

    /* these count from 1:
     * ni[] is current element (from 1)
     * nc[] is number at start of current line
     */
    for (px = out[ix]; *px; px++)
        (void) putc(*px&0x7F, fx);
    (void) putc('\n', fx);
}


/*
 * put a line of stars (seqs always in out[0], out[1]): dumpblock()
 */
static
stars()
{
    int           i;
    register char *p0, *p1, cx, *px;

    if (!*out[0] || (*out[0] == ' ' && *(po[0]) == ' ') ||
        !*out[1] || (*out[1] == ' ' && *(po[1]) == ' '))
        return;
    px = star;
    for (i = lmax+P_SPC; i; i--)
        *px++ = ' ';

    for (p0 = out[0], p1 = out[1]; *p0 && *p1; p0++, p1++) {
        if (isalpha(*p0) && isalpha(*p1)) {
            if (xbm[*p0-'A']&xbm[*p1-'A']) {
                cx = '*';
                nm++;
            }
            else if (!dna && _day[*p0-'A'][*p1-'A'] > 0)
                cx = '.';
            else
                cx = ' ';
        }
        else
            cx = ' ';
        *px++ = cx;
    }
    *px++ = '\n';
    *px = '\0';
}

/*
 * strip path or prefix from pn, return len: pr_align()
 */
static
stripname(pn)
    char *pn; /* file name (may be path) */
{
    register char *px, *py;

    py = 0;
    for (px = pn; *px; px++)
        if (*px == '/')
            py = px + 1;
    if (py)
        (void) strcpy(pn, py);
    return(strlen(pn));
}
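
stripname() above keeps only the part of a file name that follows the last '/'. A minimal non-destructive sketch of the same idea is given below, assuming a standard C library; the name `base_name` is hypothetical and is not part of the listing. Unlike the listing, it does not copy within the caller's buffer (in standard C, `strcpy` between overlapping objects has undefined behavior), it simply returns a pointer into the original string.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical variant of stripname(): returns a pointer to the part of
 * path after the last '/', leaving the caller's string unmodified. */
const char *base_name(const char *path)
{
    const char *slash;

    assert(path != NULL);
    slash = strrchr(path, '/');   /* last '/' in path, or NULL if none */
    return slash ? slash + 1 : path;
}
```

The length that stripname() returns can then be obtained with `strlen(base_name(path))`.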

/*
 * cleanup() - cleanup any tmp file
 * getseq() - read in seq, set dna, len, maxlen
 * g_calloc() - calloc() with error checkin
 * readjmps() - get the good jmps, from tmp file if necessary
 * writejmps() - write a filled array of jmps to a tmp file: nw()
 */

#include "nw.h"
#include <sys/file.h>

char *jname = "/tmp/homgXXXXXX"; /* tmp file for jmps */
FILE *fj;

int  cleanup();                  /* cleanup tmp file */
long lseek();

/*
 * remove any tmp file if we blow
 */
cleanup(i)
    int i;
{
    if (fj)
        (void) unlink(jname);
    exit(i);
}


/*
 * read, return ptr to seq, set dna, len, maxlen
 * skip lines starting with ';', '<', or '>'
 * seq in upper or lower case
 */
char *
getseq(file, len)
    char *file; /* file name */
    int  *len;  /* seq len */
{
    char          line[1024], *pseq;
    register char *px, *py;
    int           natgc, tlen;
    FILE          *fp;

    if ((fp = fopen(file, "r")) == 0) {
        fprintf(stderr, "%s: can't read %s\n", prog, file);
        exit(1);
    }
    tlen = natgc = 0;
    while (fgets(line, 1024, fp)) {
        if (*line == ';' || *line == '<' || *line == '>')
            continue;
        for (px = line; *px != '\n'; px++)
            if (isupper(*px) || islower(*px))
                tlen++;
    }
    if ((pseq = malloc((unsigned)(tlen+6))) == 0) {
        fprintf(stderr, "%s: malloc() failed to get %d bytes for %s\n", prog, tlen+6, file);
        exit(1);
    }
    pseq[0] = pseq[1] = pseq[2] = pseq[3] = '\0';

    py = pseq + 4;
    *len = tlen;
    rewind(fp);

    while (fgets(line, 1024, fp)) {
        if (*line == ';' || *line == '<' || *line == '>')
            continue;
        for (px = line; *px != '\n'; px++) {
            if (isupper(*px))
                *py++ = *px;
            else if (islower(*px))
                *py++ = toupper(*px);
            if (index("ATGCU", *(py-1)))
                natgc++;
        }
    }
    *py++ = '\0';
    *py = '\0';
    (void) fclose(fp);
    dna = natgc > (tlen/3);
    return(pseq+4);
}
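
getseq() decides whether a sequence is DNA by counting the letters A, T, G, C and U and comparing that count with one third of the sequence length. The following standalone sketch isolates that test; the function name `looks_like_dna` is hypothetical, the input is assumed to be already upper-cased letters only, and `strchr` is used in place of the older `index`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper mirroring the listing's test "natgc > tlen/3".
 * Assumes seq holds upper-case letters only. */
int looks_like_dna(const char *seq)
{
    int natgc = 0, tlen = 0;

    assert(seq != NULL);
    for (; *seq; seq++) {
        tlen++;
        if (strchr("ATGCU", *seq))   /* count nucleotide letters */
            natgc++;
    }
    return natgc > tlen / 3;         /* integer division, as in the listing */
}
```

Applied to the first 21 residues of the Figure 2 polypeptide, MRGATRVSIMLLLVTVSDCAV, the count is 6 of 21 letters, which does not exceed 21/3 = 7, so the sequence is treated as protein. The comparison is strict, so a sequence with exactly one third ACGTU letters is also treated as protein, even though the header comment of the program says "1/3 or more".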


char *
g_calloc(msg, nx, sz)
    char *msg;   /* program, calling routine */
    int  nx, sz; /* number and size of elements */
{
    char *px, *calloc();

    if ((px = calloc((unsigned)nx, (unsigned)sz)) == 0) {
        if (*msg) {
            fprintf(stderr, "%s: g_calloc() failed %s (n=%d, sz=%d)\n", prog, msg, nx, sz);
            exit(1);
        }
    }
    return(px);
}


/*
 * get final jmps from dx[] or tmp file, set pp[], reset dmax: main()
 */
readjmps()
{
    int      fd = -1;
    int      siz, i0, i1;
    register i, j, xx;

    if (fj) {
        (void) fclose(fj);
        if ((fd = open(jname, O_RDONLY, 0)) < 0) {
            fprintf(stderr, "%s: can't open() %s\n", prog, jname);
            cleanup(1);
        }
    }
    for (i = i0 = i1 = 0, dmax0 = dmax, xx = len0; ; i++) {
        while (1) {
            for (j = dx[dmax].ijmp; j >= 0 && dx[dmax].jp.x[j] >= xx; j--)
                ;
            if (j < 0 && dx[dmax].offset && fj) {
                (void) lseek(fd, dx[dmax].offset, 0);
                (void) read(fd, (char *)&dx[dmax].jp, sizeof(struct jmp));
                (void) read(fd, (char *)&dx[dmax].offset, sizeof(dx[dmax].offset));
                dx[dmax].ijmp = MAXJMP-1;
            }
            else
                break;
        }
        if (i >= JMPS) {
            fprintf(stderr, "%s: too many gaps in alignment\n", prog);
            cleanup(1);
        }
        if (j >= 0) {
            siz = dx[dmax].jp.n[j];
            xx = dx[dmax].jp.x[j];
            dmax += siz;
            if (siz < 0) {       /* gap in second seq */
                pp[1].n[i1] = -siz;
                xx += siz;
                /* id = xx - yy + len1 - 1
                 */
                pp[1].x[i1] = xx - dmax + len1 - 1;
                gapy++;
                ngapy -= siz;
                /* ignore MAXGAP when doing endgaps */
                siz = (-siz < MAXGAP || endgaps)? -siz : MAXGAP;
                i1++;
            }
            else if (siz > 0) {  /* gap in first seq */
                pp[0].n[i0] = siz;
                pp[0].x[i0] = xx;
                gapx++;
                ngapx += siz;
                /* ignore MAXGAP when doing endgaps */
                siz = (siz < MAXGAP || endgaps)? siz : MAXGAP;
                i0++;
            }
        }
        else
            break;
    }

    /* reverse the order of jmps
     */
    for (j = 0, i0--; j < i0; j++, i0--) {
        i = pp[0].n[j]; pp[0].n[j] = pp[0].n[i0]; pp[0].n[i0] = i;
        i = pp[0].x[j]; pp[0].x[j] = pp[0].x[i0]; pp[0].x[i0] = i;
    }
    for (j = 0, i1--; j < i1; j++, i1--) {
        i = pp[1].n[j]; pp[1].n[j] = pp[1].n[i1]; pp[1].n[i1] = i;
        i = pp[1].x[j]; pp[1].x[j] = pp[1].x[i1]; pp[1].x[i1] = i;
    }
    if (fd >= 0)
        (void) close(fd);
    if (fj) {
        (void) unlink(jname);
        fj = 0;
        offset = 0;
    }
}

/*
 * write a filled jmp struct offset of the prev one (if any): nw()
 */
writejmps(ix)
    int ix;
{
    char *mktemp();

    if (!fj) {
        if (mktemp(jname) < 0) {
            fprintf(stderr, "%s: can't mktemp() %s\n", prog, jname);
            cleanup(1);
        }
        if ((fj = fopen(jname, "w")) == 0) {
            fprintf(stderr, "%s: can't write %s\n", prog, jname);
            exit(1);
        }
    }
    (void) fwrite((char *)&dx[ix].jp, sizeof(struct jmp), 1, fj);
    (void) fwrite((char *)&dx[ix].offset, sizeof(dx[ix].offset), 1, fj);
}